Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix parsing of numbers of Locales with "non-standard"-grouping and/or decimal separators #3778

Closed
wants to merge 6 commits into from

Conversation

kimmerin
Copy link
Contributor

@kimmerin kimmerin commented Feb 5, 2024

Hi,

as discussed in the forum, if numbers are formatted e.g. with the Swiss Locale, they can't be parsed. This happens for regular and limit price values. I've fixed this issue by creating a regex pattern using the actual values from DecimalFormatSymbols and refactored the classes a bit to be able to put tests around them

getTextColor to find a useful color in dependence of the background.
Changed the threshold of getTextColor because "pure" green is way to
bright for a white foreground color to be readable
@kimmerin kimmerin changed the title Fix parsing of nubers of Locales with "non-standard"-grouping and/or decimal separators Fix parsing of numbers of Locales with "non-standard"-grouping and/or decimal separators Feb 5, 2024
Copy link
Member

@buchen buchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for tackling this problem.

BTW, it would greatly help me if you create a new branch for every pull request against the portfolio/master branch. This branch included commits from previous pull requests that I had merged in - but sometimes I rebase the commit, sometimes I slightly edit the commit. Therefore I had those left-overs here.

Comment on lines -28 to +48
private static final Pattern PATTERN = Pattern.compile("^([\\d.,-]*)$"); //$NON-NLS-1$
private static final Pattern LIMIT_PRICE_PATTERN = Pattern.compile("^\\s*(<=?|>=?)\\s*([0-9,.']+)$"); //$NON-NLS-1$
private static Pattern PATTERN;
private static Pattern LIMIT_PRICE_PATTERN;

static
{
initPatterns();
}

@SuppressWarnings("nls")
static void initPatterns()
{
DecimalFormatSymbols dfs = DecimalFormatSymbols.getInstance(Locale.getDefault(Locale.Category.FORMAT));

// e.g. \\-?(\\d+\\.)*\\d+(\\,\\d+)?
String numberRegex = "\\" + dfs.getMinusSign() + "?(\\d+\\" + dfs.getGroupingSeparator() + ")*\\d+(\\"
+ dfs.getDecimalSeparator() + "\\d+)?";
PATTERN = Pattern.compile("^\\s*" + numberRegex + "\\s*$");
LIMIT_PRICE_PATTERN = Pattern.compile("^\\s*(<=?|>=?)\\s*(" + numberRegex + ")\\s*$");
}
Copy link
Member

@buchen buchen Feb 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me try to understand the motivation for this change.

The current code first applies a pattern to check if it is kind of a normal number (and to find the indicator '>', '<', etc. for the limit price) but then parses the value the a DecimalFormat build from the current locale. If either of those fail, then an error is thrown.

To be honest, I can't remember why the code is the way it is. Because one could argue that the DecimalFormat is enough - if it parses the number, then it is a number (maybe check if there are no remaining characters left in the string).

Therefore, instead of making the pattern more complex, I wonder if we either get rid of the pattern at all or just add the grouping separator - say ' and a blank . Is your concern that there are unknown separators for obscure locales?

 private static final Pattern PATTERN = Pattern.compile("^([\\d.,-' ]*)$"); //$NON-NLS-1$

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My theory about this particular logic is that checking the value to be parsed against a regex makes sure that you don't successfully parse values as correct while they aren't correct, e.g. an english formatted 1,234.56 with a german Locale. Have to check but I think the result would be 1.23456. Fun fact: The original regex wouldn't catch this problem ;-)

Of course, if it can be removed alltogether, I can remove it but my motivation was to fix the issue without changing any preexisting logic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Therefore, instead of making the pattern more complex, I wonder if we either get rid of the pattern at all or just add the grouping separator - say ' and a blank . Is your concern that there are unknown separators for obscure locales?

I'd say: Do a correct check (requiring the more complex regex I've created) or leave it away. But that's no my decision to make ;-)

Copy link
Contributor Author

@kimmerin kimmerin Feb 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, I can't remember why the code is the way it is. Because one could argue that the DecimalFormat is enough

After further thinking about this: DecimalFormat would also successfully parse formats using the exp-syntax like 1.2345E2. The check against the regex prevents that from happening. Maybe that was the motivation "back then" and might be a reason in favor of keeping the check today.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe that was the motivation "back then" and might be a reason in favor of keeping the check today.

I am not sure what my motivation was back then. Maybe a naive thinking that number parsing can be easier...

just on example: our friends in Belgium use the dot as decimal separator even though that is not the standard separator in the locale. Why? Because the dot is easier to reach on the keyboard. Even Excel supports this - I checked it.

But anyway, I think your code is as good as it gets for parsing a locale-specific number based on the configured symbols.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The more I think about it, maybe we should use the StringToCurrencyConverter here as well. That would make the parsing consistent to other input fields.

Comment on lines -67 to +69
private static final String QUOTE_PATTERN = "#,##0.00######"; //$NON-NLS-1$
static final String QUOTE_PATTERN = "#,##0.00######"; //$NON-NLS-1$

private static final ThreadLocal<DecimalFormat> QUOTE_FORMAT = ThreadLocal // NOSONAR
static final ThreadLocal<DecimalFormat> QUOTE_FORMAT = ThreadLocal // NOSONAR
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I understand it correct, that this change is made to facility testing? Because you need to change the locale, but the DecimalFormat in the ThreadLocal also needs to change? If so, then I think an explicit method is also okay. If we annotate it with @VisibleForTesting then the IDE even provides hints in case we accidentally want to use it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made the change to be able to change the Locale during testing in order to replace the instance of DecimalFormat in QUOTE_FORMAT after it has changed. Concerning a method: I've implemented one in ValuesBuilder.initQuoteValuesDecimalFormat which is why I've changed the visibility of these two members in the first place. So I'm not sure if another method in non-test-code is necessary.

};
}

public static <T extends Number> Values<T> createNumberValues(String pattern, int precision)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this (test) method is not used anywhere in the code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, I've created it and then realized that the code that is involved in the parsing of monetary values didn't use the parse-methods in the Values-instance but above DecimalFormat in QUOTE_FORMAT. I've left the method in because it might come handy in future testcases where this might be needed.

@buchen buchen self-assigned this Feb 15, 2024
@buchen buchen added the bug label Feb 15, 2024
@kimmerin
Copy link
Contributor Author

Hi @buchen,

are there any open questions or issues with this PR that I need to address? Concerning the issue with the "left over" class of another PR, can this be solved from your side by ignoring that class when merging the changes into the code base? I'm not that git-savvy and am concerned that by trying to get this sorted out on my side, this will end in misery.

@buchen
Copy link
Member

buchen commented Mar 5, 2024

are there any open questions or issues with this PR that I need to address?

I was just busy with other things. I will pick it up now.

buchen added a commit that referenced this pull request Mar 5, 2024
The attribute implementation uses a pattern to check for valid numbers
before parsing the number. The pattern did not take local dependent
thousands and decimal separator into account.

With this change, the pattern is created based on the given locale.
To enable testing, the pattern is created when the converter is created.
Because converters are created only once per attribute, this should not
be a performance problem.

Issue: #3778
Co-authored-by: Lothar <github@kimmeringer.de>
Co-authored-by: Andreas Buchen <andreas.buchen@gmail.com>
@buchen
Copy link
Member

buchen commented Mar 5, 2024

Merged.

I decided to get rid of the static pattern creation. If it is not static, then the test can just instantiate a class and the pattern will use the right locale. I do not think that is a performance problem, because the AttributeType is only instantiated once and therefore the converter is only instantiated once. So it is not like we have now thousands of Pattern instantiations.

And I decided to not test the toString methods. Why? Because it makes the code really messy due to the static Values.*. Of course, today I would not use such pattern anymore, but that's the way it is. Plus: it couples the formatting with the definition of digits which also does not really make sense. Anyway, the tests are valuable because they ensure proper parsing.

Thanks for the change. And thanks for your patience. And thanks for the new test case - that made it much easier to do the change.

@buchen buchen closed this Mar 5, 2024
@kimmerin
Copy link
Contributor Author

kimmerin commented Mar 9, 2024

Thanks for the change. And thanks for your patience. And thanks for the new test case - that made it much easier to do the change.

No problem and that's what unit tests are for ;-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants